1 Introduction

Use CTRL/CMD + Shift + K to knit and preview your markdown. Hit the Visual button, or use CTRL/CMD + Shift + F4, to switch to visual mode, which lets you edit the formatted version in real time.

At the top of the .Rmd file is the YAML header. YAML (originally “Yet Another Markup Language”, now officially “YAML Ain’t Markup Language”) is a human-readable data serialization language. The header sets some options for your markdown and gives you a nicely formatted preamble. It is currently arranged with some of my preferred settings, but feel free to play around with this and make it your own.

We have it set here to output as HTML, but you can just as easily produce PDF or Word documents. Note that some elements may come out differently (or not at all) when we render to different formats. There are a bunch of built-in themes that you can explore here.
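As a rough illustration, a header along these lines would give an HTML document with numbered headers and a table of contents that tracks three levels. The title, author, and theme values are placeholders, not the actual settings behind this page; the option names are those of rmarkdown::html_document.

```yaml
---
title: "R Markdown Demo"      # placeholder title
author: "Your Name"           # placeholder author
output:
  html_document:
    toc: true                 # include a table of contents
    toc_depth: 3              # track the first three header levels
    number_sections: true     # number headers automatically
    theme: flatly             # one of the built-in themes (an arbitrary pick)
---
```

Swapping html_document for pdf_document or word_document changes the output format, though HTML-only options such as theme would need to be removed first.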

Below the YAML header in the .Rmd document you will find the first “code chunk”. You will notice that this one does not appear in the rendered document. This is because it is the setup chunk, and it has the setting include=FALSE. We use this setup chunk to set options for chunk behavior, as well as to load packages and data. Note that when you knit, the document is rendered in a fresh R session, separate from your interactive workspace, so you have to load all your data and packages within the .Rmd file.
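A setup chunk might look something like the sketch below. The package list is a guess based on what is used later in this document, and the data path is hypothetical, so adjust both to your own project.

````markdown
```{r setup, include=FALSE}
# Default options applied to every chunk unless overridden in its header
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

# Packages used later in the document (assumed list)
library(dplyr)
library(ggplot2)
library(gapminder)

# Data (hypothetical file path; point this at wherever your copy lives)
fsci <- readRDS('data/fsci.rds')
```
````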

2 Syntax

The # above makes a header. A single # is the largest header, and extra #s are smaller headers.

2.1 Smaller Header

This header is automatically numbered because of the YAML settings and the double #.

2.1.1 Even Smaller Header

This is three #s.

2.1.1.1 Super Tiny Header

This is four #s. Note that it does not show up in the table of contents because we only asked it to keep track of the first three levels.

2.2 More Syntax

Use a single asterisk to make font italic.

Use double asterisks to make font bold.

Note that you need a blank line between paragraphs to split up text. Starting on a new line is not enough.

To make bullet points, use -

  • A thing
  • Another thing
    • Sub thing

To make numbered lists, use 1.

  1. First thing
  2. Second thing
    • Sub thing

To put code in-line, use back ticks (``)

For multiple lines of verbatim code, use triple back ticks.
x + 1 = y

To make block quotes, use > at the start of the line.
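Because the rendered page shows results and not raw markup, here is a sketch of what the source for this section might look like. It is reconstructed from the descriptions above and is not copied from the original .Rmd, so the example text is illustrative only.

````markdown
# Syntax

## Smaller Header

### Even Smaller Header

#### Super Tiny Header

Use *a single asterisk* for italic and **double asterisks** for bold.

This is a new paragraph because there is a blank line above it.

- A thing
- Another thing
    - Sub thing

1. First thing
2. Second thing
    - Sub thing

Inline code looks like `x + 1`.

```
x + 1 = y
```

> This is a block quote.
````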

3 Code Chunks

Here we will explore some proper code chunks. You can use CTRL/CMD + ALT + I to create a new chunk. In the chunk header, the chunk name comes right after the r. A name is not required, but it is convenient when we hit an error, because the message will tell us the name of the chunk where the error occurred. Otherwise, it will just say “error in chunk 14” or some such.

We will be using data from Schneider et al. 2024. The code and data are available on a GitHub repository. While we’re at it, all the scripts and data for this course are available in a repository as well. Let’s start by cleaning up our data a little bit in this first code chunk:

# Remove ampersands
fsci$FSCI_region <- gsub('&', 'and', fsci$FSCI_region)

# Reduce to one variable
df <- fsci[fsci$short_label == 'Prevalence of undernourishment', ]

If we want to run our code but not show the code block, we can set the echo=FALSE option in the chunk header. Otherwise, our code chunk will be visible. Let’s show off our example regression from the FSCI paper.

## 
## Call:
## lm(formula = normvalue ~ year + FSCI_region, data = df, weights = weight)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -8790.7  -724.6   -96.6   825.2  9701.4 
## 
## Coefficients:
##                                              Estimate Std. Error t value
## (Intercept)                                 982.84949   71.45042  13.756
## year                                         -0.48328    0.03555 -13.593
## FSCI_regionEastern Asia                      -5.96480    2.32473  -2.566
## FSCI_regionLatin America and Caribbean       -3.30360    2.34519  -1.409
## FSCI_regionNorthern Africa and Western Asia  -1.39026    2.38571  -0.583
## FSCI_regionNorthern America and Europe       -9.92333    2.89919  -3.423
## FSCI_regionOceania                           20.06159    5.13223   3.909
## FSCI_regionSouth-eastern Asia                 1.88444    2.33252   0.808
## FSCI_regionSouthern Asia                      6.89466    2.28244   3.021
## FSCI_regionSub-Saharan Africa                15.69689    2.31040   6.794
##                                                         Pr(>|t|)    
## (Intercept)                                 < 0.0000000000000002 ***
## year                                        < 0.0000000000000002 ***
## FSCI_regionEastern Asia                                  0.01035 *  
## FSCI_regionLatin America and Caribbean                   0.15906    
## FSCI_regionNorthern Africa and Western Asia              0.56012    
## FSCI_regionNorthern America and Europe                   0.00063 ***
## FSCI_regionOceania                               0.0000951814320 ***
## FSCI_regionSouth-eastern Asia                            0.41923    
## FSCI_regionSouthern Asia                                 0.00255 ** 
## FSCI_regionSub-Saharan Africa                    0.0000000000136 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2094 on 2484 degrees of freedom
## Multiple R-squared:  0.3246, Adjusted R-squared:  0.3221 
## F-statistic: 132.6 on 9 and 2484 DF,  p-value: < 0.00000000000000022

This shows our output, but not the code chunk.

We can see our regression output much like we do when we run it in a script, but it is not terribly nice to look at here.
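For reference, the hidden chunk behind the output above presumably looks something like this in the .Rmd source. The chunk name is made up; the formula is taken from the Call line in the output, and the object name lm is inferred from the chunks that follow.

````markdown
```{r regression, echo=FALSE}
# Weighted linear model; the formula matches the Call shown in the output
lm <- lm(normvalue ~ year + FSCI_region, data = df, weights = weight)
summary(lm)
```
````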

4 Regression Outputs

4.1 Kable

To get a cleaner output, we can convert our regression results to a data frame, then use knitr::kable() to create a nice looking table.

lm_df <- broom::tidy(lm)
knitr::kable(lm_df)

term                                            estimate   std.error    statistic    p.value
(Intercept)                                  982.8494947  71.4504180   13.7556857  0.0000000
year                                          -0.4832786   0.0355529  -13.5932126  0.0000000
FSCI_regionEastern Asia                       -5.9647972   2.3247336   -2.5657982  0.0103520
FSCI_regionLatin America and Caribbean        -3.3036003   2.3451881   -1.4086718  0.1590574
FSCI_regionNorthern Africa and Western Asia   -1.3902648   2.3857103   -0.5827467  0.5601167
FSCI_regionNorthern America and Europe        -9.9233258   2.8991908   -3.4227915  0.0006299
FSCI_regionOceania                            20.0615880   5.1322298    3.9089419  0.0000952
FSCI_regionSouth-eastern Asia                  1.8844396   2.3325248    0.8078969  0.4192273
FSCI_regionSouthern Asia                       6.8946569   2.2824376    3.0207428  0.0025473
FSCI_regionSub-Saharan Africa                 15.6968937   2.3103971    6.7940243  0.0000000

We can take some extra steps to get the column names capitalized and the numbers rounded:

lm_df_cleaner <- lm_df %>% 
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 3))) %>% 
  setNames(c(snakecase::to_title_case(names(.))))
knitr::kable(lm_df_cleaner)

Term                                         Estimate  Std Error  Statistic  P Value
(Intercept)                                   982.849     71.450     13.756    0.000
year                                           -0.483      0.036    -13.593    0.000
FSCI_regionEastern Asia                        -5.965      2.325     -2.566    0.010
FSCI_regionLatin America and Caribbean         -3.304      2.345     -1.409    0.159
FSCI_regionNorthern Africa and Western Asia    -1.390      2.386     -0.583    0.560
FSCI_regionNorthern America and Europe         -9.923      2.899     -3.423    0.001
FSCI_regionOceania                             20.062      5.132      3.909    0.000
FSCI_regionSouth-eastern Asia                   1.884      2.333      0.808    0.419
FSCI_regionSouthern Asia                        6.895      2.282      3.021    0.003
FSCI_regionSub-Saharan Africa                  15.697      2.310      6.794    0.000

There are many more options for building static tables with knitr::kable() and the kableExtra package. This is probably the most powerful table toolkit I’ve found. See the docs for examples. Documentation like this, written by the package author with abundant vignettes and examples, is where you really learn how to use a package.
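To give a flavor of kableExtra, here is a minimal sketch that layers some styling on top of a kable() table. It uses the built-in mtcars data and a throwaway model instead of the FSCI regression so that it runs anywhere; the styling choices are arbitrary examples, not recommendations.

```r
library(knitr)
library(kableExtra)

# Fit a small model on a built-in dataset so the example is self-contained
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Build a coefficient table by hand, rounded to three digits
coef_df <- as.data.frame(round(summary(fit)$coefficients, 3))
coef_df <- cbind(Term = rownames(coef_df), coef_df)
rownames(coef_df) <- NULL

# Start from kable() in HTML format, then add kableExtra styling on top
styled <- kable(
  coef_df,
  format = 'html',
  caption = 'Regression of mpg on weight and horsepower'
)
styled <- kable_styling(
  styled,
  bootstrap_options = c('striped', 'hover', 'condensed'),
  full_width = FALSE
)
styled <- row_spec(styled, 0, bold = TRUE)  # row 0 is the header row

styled
```

Each kableExtra function takes a table and returns a modified table, so the steps can also be chained with a pipe.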

4.2 sjPlot

For a very clean regression table with less work, try the sjPlot package:

sjPlot::tab_model(
  lm, 
  p.style = 'numeric',
  digits = 3,
  show.se = TRUE,
  robust = TRUE,
  show.reflvl = TRUE,
  dv.labels = 'Undernourishment',
  pred.labels = gsub("FSCI_region", "", names(coef(lm)))
)

                                  Undernourishment
Predictors                        Estimates  std. Error  CI                       p
(Intercept)                         982.849     115.808  755.760 – 1209.939  <0.001
year                                 -0.483       0.058  -0.596 – -0.370     <0.001
Eastern Asia                         -5.965       1.695  -9.289 – -2.641     <0.001
Latin America and Caribbean          -3.304       1.516  -6.276 – -0.332      0.029
Northern Africa and Western Asia     -1.390       1.616  -4.559 – 1.779       0.390
Northern America and Europe          -9.923       1.721  -13.297 – -6.549    <0.001
Oceania                              20.062       1.953  16.232 – 23.891     <0.001
South-eastern Asia                    1.884       1.503  -1.063 – 4.832       0.210
Southern Asia                         6.895       1.502  3.949 – 9.840       <0.001
Sub-Saharan Africa                   15.697       1.712  12.340 – 19.054     <0.001
Observations                      2494
R² / R² adjusted                  0.325 / 0.322

Note that this function takes the lm object as an input, not a data frame. It is designed to work with regression models and provides a ton of options for displaying them. Check out the documentation here.

A curious hiccup with this package is that the show.fstat argument does not work. If you want to see why, check out the code behind the function. You can do this either by placing the cursor on the function and hitting F2 or by using CTRL/CMD + left click on the function.
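If you are not working in RStudio, you can still inspect a function from the console. This sketch assumes only that the sjPlot package is installed; it makes no claim about what you will find in the source.

```r
# Printing a function without parentheses shows its source in the console
sjPlot::tab_model

# List the argument names to check what the function accepts
arg_names <- names(formals(sjPlot::tab_model))
arg_names
```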

4.3 stargazer

The stargazer package is quite popular in econometrics. You can find a nice tutorial here, or a quick paper and demo arguing why you should use it from the author here. It’s a pretty nice package for easily displaying regressions in LaTeX, but I wouldn’t personally recommend it for non-LaTeX applications.

Here we will put two models together in the same table:

# Get another regression
df2 <- fsci[fsci$short_label == 'Access to safe water', ]
lm2 <- lm(normvalue ~ year + FSCI_region, data = df2, weights = weight)

stargazer::stargazer(
  lm,
  lm2,
  type = 'html',
  font.size = 'footnotesize',
  column.labels = c('Undernourishment', 'Safe Water'),
  dep.var.labels.include = FALSE,
  # stargazer lists the intercept last (as "Constant"), so drop it from the
  # labels to keep each name lined up with its own coefficient
  covariate.labels = gsub("FSCI_region", "", names(coef(lm))[-1])
)

                                  Dependent variable:
                                  Undernourishment            Safe Water
                                  (1)                         (2)
year                              -0.483***                   0.683***
                                  (0.036)                     (0.038)
Eastern Asia                      -5.965**                    33.607***
                                  (2.325)                     (2.514)
Latin America and Caribbean       -3.304                      -8.328***
                                  (2.345)                     (2.623)
Northern Africa and Western Asia  -1.390                      -5.368*
                                  (2.386)                     (2.910)
Northern America and Europe       -9.923***                   33.053***
                                  (2.899)                     (2.536)
Oceania                           20.062***                   35.413***
                                  (5.132)                     (4.446)
South-eastern Asia                1.884                       -26.648***
                                  (2.333)                     (2.606)
Southern Asia                     6.895***                    -12.534***
                                  (2.282)                     (2.508)
Sub-Saharan Africa                15.697***                   -40.530***
                                  (2.310)                     (2.582)
Constant                          982.849***                  -1,315.970***
                                  (71.450)                    (75.950)
Observations                      2,494                       3,106
R²                                0.325                       0.796
Adjusted R²                       0.322                       0.795
Residual Std. Error               2,094.342 (df = 2484)       2,995.163 (df = 3096)
F Statistic                       132.634*** (df = 9; 2484)   1,342.004*** (df = 9; 3096)
Note:                             *p<0.1; **p<0.05; ***p<0.01

Note that we put the results='asis' option into the chunk header. This tells knitr to pass the raw HTML that stargazer produces straight into the document, not to print it as code output, so it renders as a proper table.
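In the .Rmd source, the chunk header for a table like this might look as follows. The chunk name is made up, and the call is trimmed to the essentials.

````markdown
```{r stargazer-table, results='asis'}
# type = 'html' makes stargazer emit an HTML table;
# results='asis' lets knitr insert that HTML into the page untouched
stargazer::stargazer(lm, lm2, type = 'html')
```
````

When rendering to PDF instead, the type argument would need to be 'latex' so that the raw output matches the output format.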

5 Interactive Tables

We’ve already seen how to make tables above. For static tables, knitr::kable() is a good choice.

For interactive tables, there are a couple of different options.

5.1 DT

The DT package is a classic choice for interactive tables. Note that we are setting echo=FALSE here, so the code chunk will not be visible.
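Since the chunk itself is hidden, here is a minimal sketch of what a DT table can look like. The choice of the gapminder data and of these particular options is an assumption for illustration, not necessarily what the hidden chunk uses.

```r
library(DT)
library(gapminder)

# Interactive table with per-column filters and ten rows per page
dt_table <- datatable(
  gapminder,
  filter = 'top',                    # filter boxes above each column
  rownames = FALSE,                  # hide the row index
  options = list(pageLength = 10)    # rows shown per page
)

dt_table
```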

5.2 Reactable

My personal favorite for interactive tables is reactable. The documentation is excellent, so check it out if you’re interested.

reactable::reactable(
  data = gapminder,
  filterable = TRUE,
  searchable = TRUE,
  outlined = TRUE,
  bordered = TRUE,
  compact = TRUE,
  striped = TRUE,
  showPageSizeOptions = TRUE
)

I find the options for customization here much more intuitive than DT, and the documentation is much easier to use.
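As an example of that customization, the sketch below renames columns and formats numbers with colDef() and colFormat(). The labels and formats are arbitrary choices for illustration.

```r
library(reactable)
library(gapminder)

rt_table <- reactable(
  gapminder,
  searchable = TRUE,
  striped = TRUE,
  defaultPageSize = 10,
  columns = list(
    # Each entry customizes one column by name
    country = colDef(name = 'Country'),
    lifeExp = colDef(name = 'Life Expectancy', format = colFormat(digits = 1)),
    pop = colDef(name = 'Population', format = colFormat(separators = TRUE)),
    gdpPercap = colDef(
      name = 'GDP per Capita',
      format = colFormat(prefix = '$', separators = TRUE, digits = 0)
    )
  )
)

rt_table
```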

6 Plots

An excellent reference for graphs is the R Graph Gallery, which has lots of examples to explore and accompanying code for each figure.

6.1 Static Plots

We haven’t really covered plots, but you really just throw your code in the chunk and it will appear.

6.1.1 Base Plot

The plot() function comes with base R and can do just about anything. I personally find that it works great for simple plots, but more elaborate and pretty plots take more work.

# Filter gapminder to the year 2007 only
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

# Plot gapminder data
plot(
  x = gapminder_2007$gdpPercap,
  y = gapminder_2007$lifeExp,
  col = gapminder_2007$continent,
  pch = 16,
  cex = sqrt(gapminder_2007$pop) / 10000,
  ylab = 'Life Expectancy',
  xlab = 'GDP per Capita',
  main = 'Life Expectancy against GDP per Capita (2007)'
)

# Add a legend to the plot above
legend(
  "bottomright", 
  legend = levels(gapminder_2007$continent),
  col = 1:5, 
  pch = 16, 
  title = "Continent"
)

6.1.2 ggplot2

The ggplot2 package is one of the biggest strengths of R in my opinion. It is an excellent package for making pretty plots easily, with tons of extensions and extra packages for applications in mapping, chord diagrams, dendrograms, animations, etc. For a nice gallery of R graphs including example code, check out the R Graph Gallery.

# Save this plot to an object so we can use it again later
gapminder_static <- gapminder %>% 
  dplyr::filter(year == 2007) %>% 
  ggplot2::ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point() +
  theme_classic() +
  labs(
    x = 'GDP per Capita',
    y = 'Life Expectancy',
    title = 'Life Expectancy against GDP per Capita (2007)'
  )

# Show plot created above
gapminder_static

We can change the alignment, size, and resolution of our plot in the chunk options:

# Show same plot from last chunk but with different settings
gapminder_static

Caption Goes Here
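A chunk header along these lines would center the figure, set its size and resolution, and add the caption. The chunk name and the specific values are assumptions; the option names are standard knitr chunk options.

````markdown
```{r static-plot-styled, fig.align='center', fig.width=6, fig.height=4, dpi=300, fig.cap='Caption Goes Here'}
# Show same plot from last chunk but with different settings
gapminder_static
```
````

Here fig.width and fig.height are in inches, and dpi controls the resolution of the saved image.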

6.2 Interactive Plots

What about an interactive plot? We can use the very popular plotly package to do this. Plotly is built on the plotly.js JavaScript library and is widely used from Python, but the plotly R package gives us an easy way to access it. It has its own syntax, but you can also use the ggplotly() function to convert a ggplot object to a plotly object.

# This time we'll save the plot to an object that we can call later
gapminder_interactive <- gapminder %>%
  dplyr::filter(year == 2007) %>%
  ggplot2::ggplot(aes(
    x = gdpPercap,
    y = lifeExp,
    color = continent,
    size = pop,
    text = paste0(
      'Country: ', country, '\n',
      'Continent: ', continent, '\n',
      'GDP per capita: $',
      stringr::str_squish(format(round(gdpPercap, 0), big.mark = ',')), '\n',
      'Life Exp: ', round(lifeExp, 1), ' years\n',
      'Population: ', format(pop, big.mark = ',')
    )
  )) +
  geom_point() +
  theme_classic() +
  labs(
    x = 'GDP per Capita',
    y = 'Life Expectancy',
    title = 'Life Expectancy against GDP per Capita (2007)'
  )

# Use ggplotly function on the plot object we made above
plotly::ggplotly(gapminder_interactive, tooltip = 'text')

Note that you can hover over points to see more information from our text field, and also move, zoom, select, and download a static image of the plot.
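For comparison, here is a rough sketch of a similar plot written in plotly’s own syntax, without going through ggplot2. It is a simplified version: the hover text shows only the country name.

```r
library(plotly)
library(gapminder)

# Filter gapminder to the year 2007 only
gapminder_2007 <- gapminder[gapminder$year == 2007, ]

# Columns are mapped with the formula syntax (~column)
p <- plot_ly(
  data = gapminder_2007,
  x = ~gdpPercap,
  y = ~lifeExp,
  color = ~continent,
  size = ~pop,
  text = ~country,
  type = 'scatter',
  mode = 'markers'
)

# Titles and axis labels are set with layout()
p <- layout(
  p,
  title = 'Life Expectancy against GDP per Capita (2007)',
  xaxis = list(title = 'GDP per Capita'),
  yaxis = list(title = 'Life Expectancy')
)

p
```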

7 Optional Resources

Great quick references:

If you want to dive deeper: